
Revert "[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache" (#30681) #36076

Draft

zhewenl wants to merge 1 commit into vllm-project:main from zhewenl:auto-revert/pr-30681

Conversation

@zhewenl (Collaborator) commented Mar 5, 2026

Revert of #30681

This reverts the merge commit for PR #30681, which replaced `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` across the codebase.

Reason

This PR is linked to 1 new CI failure in nightly build #54530:

  • Distributed Tests (4 GPUs): `test_torchrun_example_moe.py` fails with a KV cache memory error: available memory 0.49 GiB < needed 0.50 GiB. Replacing `torch.cuda.empty_cache` with `torch.accelerator.empty_cache` may affect GPU memory reclamation behavior, causing this marginal shortfall.
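For context, the dispatch in question can be sketched as a small helper (the `empty_device_cache` name and the stand-in namespaces below are illustrative, not vLLM code; vLLM itself routes through its platform abstraction). Newer PyTorch builds expose a device-generic `torch.accelerator.empty_cache`, while older ones only have per-backend variants such as `torch.cuda.empty_cache`, so a guarded fallback looks roughly like this:

```python
from types import SimpleNamespace

# Hypothetical helper: prefer the device-generic accelerator API when the
# running PyTorch build provides it, otherwise fall back to the CUDA one.
def empty_device_cache(torch_mod):
    accelerator = getattr(torch_mod, "accelerator", None)
    if accelerator is not None and hasattr(accelerator, "empty_cache"):
        accelerator.empty_cache()
    else:
        torch_mod.cuda.empty_cache()

# Demo with stand-ins for the torch module, so the dispatch is visible
# without a GPU: record which cache-release path was taken.
calls = []
fake_torch_new = SimpleNamespace(
    accelerator=SimpleNamespace(empty_cache=lambda: calls.append("accelerator")),
    cuda=SimpleNamespace(empty_cache=lambda: calls.append("cuda")),
)
empty_device_cache(fake_torch_new)   # takes the accelerator path

fake_torch_old = SimpleNamespace(
    accelerator=None,
    cuda=SimpleNamespace(empty_cache=lambda: calls.append("cuda")),
)
empty_device_cache(fake_torch_old)   # falls back to the CUDA path
print(calls)
```

Even when both paths are available, the two APIs may not release cached blocks identically on every backend, which is consistent with the marginal 0.01 GiB shortfall seen in the failing test.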

Auto-generated

This revert PR was auto-generated by the CI failure analyzer. Please review before merging.

@mergify mergify bot (Contributor) commented Mar 5, 2026

Documentation preview: https://vllm--36076.org.readthedocs.build/en/36076/

@mergify mergify bot added the documentation, performance, nvidia, structured-output, v1 labels Mar 5, 2026
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request reverts a previous change that replaced `torch.cuda.empty_cache` with `torch.accelerator.empty_cache`, which caused CI failures. The revert is mostly mechanical, but in some platform-agnostic files it correctly uses the platform abstraction (`current_platform.empty_cache()`) instead of hardcoding `torch.cuda.empty_cache`, which is an improvement. However, I have identified a critical issue in `vllm/v1/worker/xpu_model_runner.py` where a monkey-patch is not reverted, potentially leading to side effects.

```python
if supports_xpu_graph():
    torch.cuda.graph = torch.xpu.graph
    torch.cuda.CUDAGraph = torch.xpu.XPUGraph
    torch.cuda.empty_cache = torch.xpu.empty_cache
```
critical

This monkey-patch is not reverted in a finally block, making it permanent for the process. This can cause unexpected behavior if other parts of the code expect the original `torch.cuda.empty_cache`; the same issue exists for `torch.cuda.graph` and `torch.cuda.CUDAGraph` in the original code. A context manager should restore the original state upon exit. Please save the original attributes before patching and restore them in a finally block.
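The suggested fix can be sketched as a small context manager (the `patched_attrs` helper name is hypothetical, not existing vLLM code): save the originals before patching and restore them in `finally`, so `torch.cuda` regains its original `graph`, `CUDAGraph`, and `empty_cache` attributes even if the patched section raises.

```python
import contextlib
from types import SimpleNamespace

@contextlib.contextmanager
def patched_attrs(obj, **patches):
    """Temporarily replace attributes on `obj`; restore the originals on
    exit, even if the body raises."""
    saved = {name: getattr(obj, name) for name in patches}
    try:
        for name, value in patches.items():
            setattr(obj, name, value)
        yield obj
    finally:
        for name, value in saved.items():
            setattr(obj, name, value)

# In xpu_model_runner this would wrap the section that needs the XPU
# aliases, e.g. (illustrative only):
#   with patched_attrs(torch.cuda,
#                      graph=torch.xpu.graph,
#                      CUDAGraph=torch.xpu.XPUGraph,
#                      empty_cache=torch.xpu.empty_cache):
#       ...  # graph capture
# leaving torch.cuda untouched afterwards.

# Torch-free demonstration of the restore behavior:
target = SimpleNamespace(empty_cache="original")
with patched_attrs(target, empty_cache="patched"):
    inside = target.empty_cache   # patched value visible inside the block
after = target.empty_cache        # original value restored on exit
print(inside, after)
```

Using a context manager rather than paired patch/unpatch calls guarantees the restore runs on every exit path, which is exactly the gap the review flags.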

@mergify mergify bot (Contributor) commented Mar 5, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhewenl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 5, 2026
@jikunshang (Collaborator) commented Mar 9, 2026

#30681 ran all tests, and Distributed Tests (4 GPUs) passed; see https://buildkite.com/vllm/ci/builds/54293#019cb62a-38d7-4e77-a962-89d3fb0de589.
The strange thing is that it runs ibm-research/PowerMoE-3b instead of microsoft/Phi-mini-MoE-instruct.
Oh sorry, I didn't check the full log; it shows only some chat template errors.

My PR log:

```
[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [decorators.py:588] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/0da436ac5f91ca7287450564ca8ac58a52973ee129f411ac065de87300f1e07d/rank_0_0/model
[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [gpu_worker.py:424] Available KV cache memory: 4.87 GiB
[2026-03-04T00:25:02Z] INFO 03-04 00:25:02 [kv_cache_utils.py:1314] GPU KV cache size: 21,888 tokens
```


@mergify mergify bot added the intel-gpu Related to Intel GPU label Mar 31, 2026

Labels

documentation, intel-gpu, needs-rebase, nvidia, performance, structured-output, v1

Projects

Status: No status

3 participants